Abstract. There is a result of Diaconis and Freedman which says that, in a limiting sense, for large
collections of high-dimensional data most one-dimensional projections of the data are approximately
Gaussian. This paper gives quantitative versions of that result. For a set of deterministic vectors
$\{x_i\}_{i=1}^n$ in $\mathbb{R}^d$ with $n$ and $d$ fixed, let $\theta \in S^{d-1}$ be a random point of the sphere and let
$\mu_n^\theta$ denote the random measure which puts mass $\frac{1}{n}$ at each of the points
$\langle x_1, \theta\rangle, \ldots, \langle x_n, \theta\rangle$. For a fixed bounded Lipschitz test function $f$, $Z$ a standard
Gaussian random variable and $\sigma^2$ a suitable constant, an explicit bound is derived for the quantity

$$P\left[\left|\int f \, d\mu_n^\theta - E f(\sigma Z)\right| > \epsilon\right].$$

A bound is also given for $P\left[d_{BL}(\mu_n^\theta, N(0, \sigma^2)) > \epsilon\right]$, where $d_{BL}$ denotes the
bounded-Lipschitz distance, which yields a lower bound on the waiting time to finding a non-Gaussian
projection of the $\{x_i\}$ if directions are tried independently and uniformly on $S^{d-1}$.



1. Introduction 

A foundational tool of data analysis is the projection of high-dimensional data to a one- or 
two-dimensional subspace in order to visually represent the data, and, ideally, identify underlying 
structure. The question immediately arises: which projections are interesting? One would like to 
answer by saying that those projections which exhibit structure are interesting; however, identifying 
which projections those are is not quite as straightforward as one might think. In particular, several 
considerations have led to the idea that one should mainly look for projections which are 
far from Gaussian in behavior, since Gaussian projections do not generally exhibit interesting 
structure. One justification for this idea is the following result due to Persi Diaconis and David 
Freedman. 

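Theorem 1 (Diaconis-Freedman [1]). Let $x_1, \ldots, x_n$ be deterministic vectors in $\mathbb{R}^d$. Suppose that
$n$, $d$ and the $x_i$ depend on a hidden index $\nu$, so that as $\nu$ tends to infinity, so do $n$ and $d$. Suppose
that there is a $\sigma^2 > 0$ such that, for all $\epsilon > 0$,

(1)   $$\frac{1}{n}\Bigl|\bigl\{ j \le n : \bigl| |x_j|^2 - \sigma^2 d \bigr| > \epsilon d \bigr\}\Bigr| \to 0,$$

and suppose that

(2)   $$\frac{1}{n^2}\Bigl|\bigl\{ j, k \le n : |\langle x_j, x_k\rangle| > \epsilon d \bigr\}\Bigr| \to 0.$$

Let $\theta \in S^{d-1}$ be distributed uniformly on the sphere, and consider the random measure $\mu_\nu^\theta$ which
puts mass $\frac{1}{n}$ at each of the points $\langle\theta, x_1\rangle, \ldots, \langle\theta, x_n\rangle$. Then as $\nu$ tends to infinity, the
measures $\mu_\nu^\theta$ tend to $N(0, \sigma^2)$ weakly in probability.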

Heuristically, Theorem 1 can be interpreted as saying that, for a large number of high-dimensional
data vectors, as long as they have nearly the same lengths and are nearly orthogonal, most
one-dimensional projections are close to Gaussian regardless of the structure of the data. It is important
to note that the conditions (1) and (2) are not too strong; in particular, even though only $d$ vectors
can be exactly orthogonal in $\mathbb{R}^d$, the $2^d$ vertices of a unit cube centered at the origin satisfy
condition (2) for "rough orthogonality".
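The heuristic can be explored numerically. The following is a minimal sketch, not taken from the paper: it draws a random sample of cube vertices (an illustrative data set), projects it onto one uniformly random direction, and measures how far the projected data are from $N(0, 1/4)$. The Kolmogorov distance is used here as a convenient stand-in for the bounded-Lipschitz distance, and the function names, seed and sizes are all assumptions made for the example.

```python
import math
import random

def random_direction(d, rng):
    """Uniform point on S^{d-1}: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def normal_cdf(t, sigma):
    """CDF of N(0, sigma^2) evaluated at t."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

def kolmogorov_distance(sample, sigma):
    """Sup-distance between the empirical CDF of `sample` and N(0, sigma^2)."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for k, t in enumerate(xs):
        c = normal_cdf(t, sigma)
        # the empirical CDF jumps from k/n to (k+1)/n at xs[k]
        worst = max(worst, abs((k + 1) / n - c), abs(k / n - c))
    return worst

rng = random.Random(0)   # fixed seed, chosen arbitrarily
d, n = 200, 2000         # illustrative sizes
# n randomly chosen vertices of the cube [-1/2, 1/2]^d
data = [[rng.choice((-0.5, 0.5)) for _ in range(d)] for _ in range(n)]
theta = random_direction(d, rng)
projections = [sum(t * x for t, x in zip(theta, row)) for row in data]
dist = kolmogorov_distance(projections, sigma=0.5)
print(f"Kolmogorov distance to N(0, 1/4): {dist:.3f}")
```

Every vertex has squared length $d/4$, so $\sigma^2 = 1/4$ is the natural variance to compare against.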

A failing of the usual interpretation of Theorem 1 is that sometimes, projections of data look
nearly Gaussian for a reason; that is, it is not always due to the central-limit type effect described
by the theorem. Thus the question arises: is there a way to tell whether a Gaussian projection
is interesting? A possible answer lies in quantifying the theorem, and then saying that a
nearly-Gaussian projection is interesting if it is "too close" to Gaussian to simply be the result of the
phenomenon described by Theorem 1. By way of analogy, one has the Berry-Esseen theorem
stating that the rate of convergence to normal of the sum of $n$ independent, identically distributed
random variables is of the order $n^{-1/2}$; if one has a sum of $n$ random variables converging to Gaussian
significantly faster, it must be happening for some reason other than just the usual central-limit
theorem. In order to implement this idea, it is necessary (as with the Berry-Esseen theorem) to
have a sharp quantitative version of the limit theorem in question.

A second motivation for proving a quantitative version of Theorem 1 is the application to waiting
times for discovering an interesting direction on which to project data. If a sequence of independent
random projection directions is tried until the empirical distribution of the projected data is more
than some threshold $\epsilon$ away from Gaussian (in some metric on measures), and $N_\epsilon$ is the number of
trials needed to find such a direction, then one can easily give a lower bound for $EN_\epsilon$ from the type of
quantitative theorem proved below.

Thus the goal of this paper is to provide a quantitative version of Theorem 1 in a fixed dimension
$d$ and for a fixed number of data vectors $n$. To do this, it is first necessary to replace conditions
(1) and (2) with non-asymptotic conditions. The conditions we will use are the following. Let $\sigma^2$
be defined by $\frac{1}{n}\sum_{i=1}^n |x_i|^2 = \sigma^2 d$. Suppose there exist $A$ and $B$ such that

(3)   $$\frac{1}{n}\sum_{i=1}^n \Bigl| \sigma^{-2}|x_i|^2 - d \Bigr| \le A,$$

and, for all $\theta \in S^{d-1}$,

(4)   $$\frac{1}{n}\sum_{i=1}^n \langle\theta, x_i\rangle^2 \le B.$$

For a little perspective on the restrictiveness of these conditions, note that, as for the conditions of
Diaconis and Freedman, they hold for the vertices of a unit cube in $\mathbb{R}^d$ (with $A = 0$ and $B = \frac{1}{4}$).
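Conditions (3) and (4) can be checked numerically for a concrete data set. The sketch below is an illustration, not part of the paper: it computes $\sigma^2$, the smallest admissible $A$, and an approximation to the smallest admissible $B$ (the largest eigenvalue of $\frac{1}{n}\sum_i x_i x_i^T$), using plain power iteration as an assumed, convenient method. The function names are hypothetical. Power iteration approaches the top eigenvalue from below and can stall if the starting vector happens to be orthogonal to the top eigenvector.

```python
import itertools
import math

def sigma_squared(xs):
    """sigma^2 defined by (1/n) * sum_i |x_i|^2 = sigma^2 * d."""
    n, d = len(xs), len(xs[0])
    return sum(v * v for x in xs for v in x) / (n * d)

def constant_A(xs):
    """Smallest A satisfying (1/n) * sum_i | |x_i|^2 / sigma^2 - d | <= A."""
    n, d = len(xs), len(xs[0])
    s2 = sigma_squared(xs)
    return sum(abs(sum(v * v for v in x) / s2 - d) for x in xs) / n

def constant_B(xs, iterations=200):
    """Approximate sup over unit theta of (1/n) * sum_i <theta, x_i>^2
    by power iteration on the matrix (1/n) * sum_i x_i x_i^T."""
    n, d = len(xs), len(xs[0])
    theta = [1.0 / math.sqrt(d)] * d
    value = 0.0
    for _ in range(iterations):
        proj = [sum(t * v for t, v in zip(theta, x)) for x in xs]  # <theta, x_i>
        value = sum(p * p for p in proj) / n                       # Rayleigh quotient
        nxt = [sum(p * x[j] for p, x in zip(proj, xs)) / n for j in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        theta = [v / norm for v in nxt]
    return value

# All 2^d vertices of the unit cube centered at the origin, for a small d.
cube = list(itertools.product((-0.5, 0.5), repeat=4))
print(sigma_squared(cube), constant_A(cube), constant_B(cube))
```

For the cube this reproduces the values quoted above, since every vertex has squared length $d/4$ and $\frac{1}{n}\sum_i x_i x_i^T = \frac{1}{4}I$.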
Under these assumptions, the following theorems hold. 

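Theorem 2. Let $\{x_i\}_{i=1}^n$ be deterministic vectors in $\mathbb{R}^d$, subject to conditions (3) and (4) above.
For a point $\theta \in S^{d-1}$, let the measure $\mu_n^\theta$ put equal mass at each of the points
$\langle\theta, x_1\rangle, \ldots, \langle\theta, x_n\rangle$.

Fix a test function $f : \mathbb{R} \to \mathbb{R}$ with $\|f\|_{BL} := \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|} \le 1$. Then for $Z$ a
standard Gaussian random variable, $\theta$ chosen uniformly on the sphere, $\sigma$ defined as above, and

$$\epsilon > \max\left\{ \frac{2\pi\sqrt{B}}{\sqrt{d-1}}, \; \frac{2(A+2)}{d-1} \right\},$$

$$P\left[\left|\int f(x)\, d\mu_n^\theta(x) - E f(\sigma Z)\right| > \epsilon\right] \le \sqrt{\frac{\pi}{2}}\, e^{-\frac{(d-1)\epsilon^2}{32B}}.$$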

Theorem 3. Let $\{x_i\}_{i=1}^n$ be deterministic vectors in $\mathbb{R}^d$, subject to conditions (3) and (4) above, and
again consider the measures $\mu_n^\theta$. If $\theta$ is chosen uniformly from $S^{d-1}$ and
$B > \epsilon > \max\left\{ \ldots, \frac{2(A+2)}{d-1} \right\}$, then

$$P\left[d_{BL}(\mu_n^\theta, N(0, \sigma^2)) > \epsilon\right] \le \frac{c_1\sqrt{B}}{\epsilon^{3/2}} \exp\left[-\frac{c_2 (d-1)\epsilon^5}{B^2}\right],$$

with $c_1$ and $c_2$ absolute constants and $d_{BL}$ denoting the bounded Lipschitz distance.

Remarks:

(i) It should be emphasized that the key difference between the results proved here and the
result of Diaconis and Freedman is that Theorems 2 and 3 hold for fixed dimension $d$ and
number of data vectors $n$; there are no limits in the statements of the theorems.

(ii) It is not necessary for $A$ and $B$ to be absolute constants; for the results above to be
of interest as $d \to \infty$, it is easy to see from the statements that it is only necessary that
$A = o(d)$ and $B = o(d)$ for Theorem 2 while $B = o(\sqrt{d})$ for Theorem 3. The reader may
also be wondering where the dependence on $n$ is in the statements above; it is built into
the definition of $B$. Note that, by definition, $B \ge \frac{|x_i|^2}{n}$ for each $i$ (take $\theta = x_i/|x_i|$ in
condition (4)); in particular, $B \ge \frac{\sigma^2 d}{n}$.
It is thus necessary that $n \to \infty$ as $d \to \infty$ for Theorem 2 and $n \gg \sqrt{d}$ for Theorem 3.

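(iii) For Theorem 2, consider the special case that $\epsilon = \frac{C}{\sqrt{d-1}}$ for a large constant $C$. Then
the statement becomes

$$P\left[\left|\int f(x)\, d\mu_n^\theta(x) - E f(\sigma Z)\right| > \frac{C}{\sqrt{d-1}}\right] \le \sqrt{\frac{\pi}{2}}\, e^{-C'^2}$$

with $C = C' \cdot 4\sqrt{2B}$. That is, roughly speaking, $\left|\int f(x)\, d\mu_n^\theta(x) - E f(\sigma Z)\right|$ is likely to be
on the order of $\frac{1}{\sqrt{d}}$ or smaller.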

It is similarly useful to consider the following special case for Theorem 3. Let $C > \frac{3}{10}$, and
consider the case $\epsilon^5 = C' \frac{\log(d-1)}{d-1}$. Then the bound above becomes:

$$P\left[d_{BL}(\mu_n^\theta, N(0, \sigma^2)) > \epsilon\right] \le \frac{C''\sqrt{B}}{(d-1)^{C - \frac{3}{10}}},$$

where $C' = C B^2 / c_2$ and $C'' = c_1 (C')^{-3/10}$. Thus, roughly speaking, the bounded
Lipschitz distance from the random measure $\mu_n^\theta$ to the Gaussian measure with mean zero
and variance $\sigma^2$ is unlikely to be more than a large multiple of $\left(\frac{\log(d-1)}{d-1}\right)^{1/5}$. We make no
claims of the sharpness of this result.
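To get a feel for the size of the bound in Theorem 2, it can be evaluated directly. The sketch below assumes the bound has the form $\sqrt{\pi/2}\, \exp[-(d-1)\epsilon^2/(32B)]$, valid for $\epsilon > \max\{2\pi\sqrt{B}/\sqrt{d-1},\ 2(A+2)/(d-1)\}$; this form is consistent with the special case $C = C' \cdot 4\sqrt{2B}$ in remark (iii), but the constants should be checked against the theorem before being relied on. The function name is hypothetical.

```python
import math

def theorem2_bound(eps, d, A, B):
    """Evaluate the assumed Theorem 2 tail bound, or return None when eps
    is below the assumed threshold (so the bound is not claimed)."""
    threshold = max(2.0 * math.pi * math.sqrt(B) / math.sqrt(d - 1),
                    2.0 * (A + 2.0) / (d - 1))
    if eps <= threshold:
        return None
    return math.sqrt(math.pi / 2.0) * math.exp(-(d - 1) * eps ** 2 / (32.0 * B))

# Cube vertices: A = 0 and B = 1/4.
print(theorem2_bound(0.5, 1001, 0.0, 0.25))    # eps above the threshold
print(theorem2_bound(0.05, 1001, 0.0, 0.25))   # eps below the threshold
```

With $B = 1/4$ the threshold is roughly $\pi/\sqrt{d-1}$, so the bound only becomes informative for deviations larger than a constant multiple of $1/\sqrt{d}$, in line with remark (iii).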
Theorem 3 can easily be used to give an estimate on the waiting time until a non-Gaussian
direction is found, if directions are tried randomly and independently. Specifically, we have the
following corollary.

Corollary 4. Let $\theta_1, \theta_2, \theta_3, \ldots$ be a sequence of independent, uniformly distributed random points
on $S^{d-1}$. Let $T_\epsilon := \min\{j : d_{BL}(\mu_n^{\theta_j}, N(0, \sigma^2)) > \epsilon\}$. Then there are constants $c$, $c'$ such that

$$ET_\epsilon \ge \frac{c\,\epsilon^{3/2}}{\sqrt{B}} \exp\left[\frac{c'(d-1)\epsilon^5}{B^2}\right].$$
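The mechanism behind the corollary is the elementary identity $ET = \sum_{m \ge 0} P[T > m]$: if each independent trial finds a non-Gaussian direction with probability at most $p$, then $P[T > m] \ge (1-p)^m$ and hence $ET \ge 1/p$. The following sketch only illustrates that arithmetic; the function name and the value of $p$ are arbitrary choices, not quantities from the paper.

```python
def geometric_tail_sum(p, tol=1e-12):
    """Sum the series sum_{m >= 0} (1 - p)^m until the terms drop below tol.
    The full series equals 1 / p."""
    total, term = 0.0, 1.0
    while term > tol:
        total += term
        term *= (1.0 - p)
    return total

p = 0.01  # illustrative per-trial success probability
print(geometric_tail_sum(p), 1.0 / p)
```

Substituting the bound of Theorem 3 for $p$ turns an exponentially small per-trial probability into an exponentially large lower bound on the expected waiting time.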



2. Proofs 

This section is mainly devoted to the proofs of Theorems 2 and 3, with some additional remarks
following the proofs. For the proof of Theorem 2, several auxiliary results are needed. The first is
an abstract normal approximation for bounding the distance of a random variable to a Gaussian
random variable in the presence of a continuous family of exchangeable pairs. The theorem is an
abstraction of an idea used by Stein in [6] to bound the distance to Gaussian of the trace of a power
of a random orthogonal matrix.

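Theorem 5 (Meckes [4]). Suppose that $(W, W_\epsilon)$ is a family of exchangeable pairs defined on a
common probability space, such that $EW = 0$ and $EW^2 = \sigma^2$. Let $\mathcal{F}$ be a $\sigma$-algebra on this space
with $\sigma(W) \subseteq \mathcal{F}$. Suppose there is a function $\lambda(\epsilon)$ and random variables $E$, $E'$ measurable with
respect to $\mathcal{F}$, such that

(i) $\frac{1}{\lambda(\epsilon)} E\left[W_\epsilon - W \mid \mathcal{F}\right] \xrightarrow[\epsilon \to 0]{L_1} -W + E'$.

(ii) $\frac{1}{2\lambda(\epsilon)\sigma^2} E\left[(W_\epsilon - W)^2 \mid \mathcal{F}\right] \xrightarrow[\epsilon \to 0]{L_1} 1 + E$.

(iii) $\frac{1}{\lambda(\epsilon)} E|W_\epsilon - W|^3 \xrightarrow[\epsilon \to 0]{} 0$.

Then if $Z$ is a standard normal random variable,

$$d_{TV}(W, \sigma Z) \le E|E| + \frac{1}{\sigma}\sqrt{\frac{\pi}{2}}\, E|E'|.$$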

The next result gives expressions for some mixed moments of entries of a Haar-distributed
orthogonal matrix. See [3], Lemma 3.3 and Theorem 1.6 for a detailed proof.

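Lemma 6. If $U = [u_{ij}]_{i,j=1}^d$ is an orthogonal matrix distributed according to Haar measure, then
$E\left[\prod_{i,j} u_{ij}^{r_{ij}}\right]$ is non-zero if and only if $r_{i\bullet} := \sum_{j=1}^d r_{ij}$ and $r_{\bullet j} := \sum_{i=1}^d r_{ij}$ are even for each $i$ and $j$.

Second and fourth-degree moments are as follows:

(i) For all $i, j$,

$$E\left[u_{ij}^2\right] = \frac{1}{d}.$$

(ii) For all $i, j, r, s, \alpha, \beta, \lambda, \mu$,

$$E\left[u_{ij}u_{rs}u_{\alpha\beta}u_{\lambda\mu}\right] = -\frac{1}{(d-1)d(d+2)}\Bigl[\delta_{ir}\delta_{\alpha\lambda}\delta_{j\beta}\delta_{s\mu} + \delta_{ir}\delta_{\alpha\lambda}\delta_{j\mu}\delta_{s\beta} + \delta_{i\alpha}\delta_{r\lambda}\delta_{js}\delta_{\beta\mu} + \delta_{i\alpha}\delta_{r\lambda}\delta_{j\mu}\delta_{s\beta} + \delta_{i\lambda}\delta_{r\alpha}\delta_{js}\delta_{\beta\mu} + \delta_{i\lambda}\delta_{r\alpha}\delta_{j\beta}\delta_{s\mu}\Bigr]$$
$$\qquad + \frac{d+1}{(d-1)d(d+2)}\Bigl[\delta_{ir}\delta_{\alpha\lambda}\delta_{js}\delta_{\beta\mu} + \delta_{i\alpha}\delta_{r\lambda}\delta_{j\beta}\delta_{s\mu} + \delta_{i\lambda}\delta_{r\alpha}\delta_{j\mu}\delta_{s\beta}\Bigr].$$

(iii) For the matrix $Q = [q_{ij}]_{i,j=1}^d$ defined by $q_{ij} := u_{i1}u_{j2} - u_{i2}u_{j1}$, and for all $i, j, \ell, p$,

$$E\left[q_{ij}q_{\ell p}\right] = \frac{2}{d(d-1)}\left[\delta_{i\ell}\delta_{jp} - \delta_{ip}\delta_{j\ell}\right].$$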

Finally, we will need to make use of the concentration of measure on the sphere, in the form of 
the following lemma. 

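Lemma 7 (Lévy, see [5]). For a function $F : S^{d-1} \to \mathbb{R}$, let $M_F$ denote its median with respect
to the uniform measure (that is, for $\theta$ distributed uniformly on $S^{d-1}$, $P[F(\theta) \le M_F] \ge \frac{1}{2}$ and
$P[F(\theta) \ge M_F] \ge \frac{1}{2}$) and let $L$ denote its Lipschitz constant. Then

$$P\left[|F(\theta) - M_F| > \epsilon\right] \le \sqrt{\frac{\pi}{2}} \exp\left[-\frac{(d-1)\epsilon^2}{2L^2}\right].$$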
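As a sanity check on the shape of this inequality, one can compare it with a Monte Carlo estimate for a simple function. The sketch below is an illustration under stated assumptions: it takes $F(\theta) = \theta_1$, which has Lipschitz constant 1 and median 0 by symmetry, and assumes the bound in the form $\sqrt{\pi/2}\, \exp[-(d-1)\epsilon^2/(2L^2)]$. The names, seed and parameter values are arbitrary.

```python
import math
import random

def uniform_sphere_point(d, rng):
    """Uniform point on S^{d-1}: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def levy_bound(eps, d, lipschitz=1.0):
    """Assumed form of the concentration bound for an L-Lipschitz function."""
    return math.sqrt(math.pi / 2.0) * math.exp(-(d - 1) * eps ** 2 / (2.0 * lipschitz ** 2))

rng = random.Random(1)            # fixed seed, chosen arbitrarily
d, eps, trials = 50, 0.3, 2000    # illustrative values
# F(theta) = first coordinate; its median is 0 by symmetry.
hits = sum(abs(uniform_sphere_point(d, rng)[0]) > eps for _ in range(trials))
empirical = hits / trials
print(empirical, levy_bound(eps, d))
```

The empirical frequency should sit well below the bound, which is not sharp in its constants but captures the Gaussian-type decay in $\sqrt{d}\,\epsilon$.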



With these results, it is now possible to give the proof of Theorem 2.

Proof of Theorem 2. The proof divides into two parts. First, an "annealed" version of the theorem
is proved using the infinitesimal version of Stein's method given by Theorem 5. Then, for a fixed test
function $f$ and $Z$ a standard Gaussian random variable, the quantity $P\left[\left|\int f\, d\mu_n^\theta - E f(\sigma Z)\right| > \epsilon\right]$ is
bounded using the annealed theorem together with the concentration of measure phenomenon.

Let $\theta$ be a uniformly distributed random point of $S^{d-1} \subseteq \mathbb{R}^d$, and let $I$ be a uniformly distributed
element of $\{1, \ldots, n\}$, independent of $\theta$. Consider the random variable $W := \langle\theta, x_I\rangle$. Then $EW = 0$
by symmetry and $EW^2 = \sigma^2$ by the condition $\frac{1}{n}\sum_{i=1}^n |x_i|^2 = \sigma^2 d$. Theorem 5 will be used to bound
the total variation distance from $W$ to $\sigma Z$, where $Z$ is a standard Gaussian random variable.
The family of exchangeable pairs needed to apply the theorem is constructed as follows. For
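$\epsilon > 0$ fixed, let

$$A_\epsilon := \begin{bmatrix} \sqrt{1-\epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1-\epsilon^2} \end{bmatrix} \oplus I_{d-2} = I_d + \begin{bmatrix} -\frac{\epsilon^2}{2} + \delta & \epsilon \\ -\epsilon & -\frac{\epsilon^2}{2} + \delta \end{bmatrix} \oplus 0_{d-2},$$

where $\delta = O(\epsilon^4)$. Let $U$ be a Haar-distributed $d \times d$ random orthogonal matrix, independent of $\theta$
and $I$, and let $W_\epsilon := \langle U A_\epsilon U^T \theta, x_I\rangle$; the pair $(W, W_\epsilon)$ is exchangeable for each $\epsilon > 0$.

Let $K$ be the $d \times 2$ matrix made of the first two columns of $U$ and $C_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Define
$Q := K C_2 K^T$ (note that this is the same $Q$ as in part (iii) of Lemma 6). Then by the construction
of $W_\epsilon$,

(5)   $$W_\epsilon - W = \left(-\frac{\epsilon^2}{2} + \delta\right)\langle KK^T\theta, x_I\rangle + \epsilon\langle Q\theta, x_I\rangle.$$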



The conditions of Theorem 5 can be verified using the expressions in Lemma 6 as follows. By
the lemma, $E[KK^T] = \frac{2}{d} I$ and $E[Q] = 0$, and so it follows from (5) that

$$E\left[W_\epsilon - W \mid W\right] = \left(-\frac{\epsilon^2}{d} + \frac{2\delta}{d}\right) W.$$

Condition (i) of Theorem 5 is thus satisfied for $\lambda(\epsilon) = \frac{\epsilon^2}{d}$ and $E' = 0$.
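For condition (ii), taking $\mathcal{F} = \sigma(\theta, I)$, Lemma 6, part (iii) yields

$$\frac{1}{2\lambda(\epsilon)\sigma^2} E\left[(W_\epsilon - W)^2 \mid \mathcal{F}\right] = \frac{\epsilon^2}{2\lambda(\epsilon)\sigma^2} E\left[\langle Q\theta, x_I\rangle^2 \mid \mathcal{F}\right] + O(\epsilon)$$
$$= \frac{d}{2\sigma^2} \sum_{i,j,r,s=1}^d E\left[q_{ij}q_{rs}\theta_j\theta_s x_{I,i}x_{I,r} \mid \mathcal{F}\right] + O(\epsilon)$$
$$= \frac{1}{\sigma^2(d-1)} \left[\sum_{i,j=1}^d \theta_j^2 x_{I,i}^2 - \sum_{i,j=1}^d \theta_i\theta_j x_{I,i}x_{I,j}\right] + O(\epsilon)$$
$$= \frac{1}{\sigma^2(d-1)} \left[|x_I|^2 - W^2\right] + O(\epsilon)$$
$$= 1 + \frac{1}{d-1}\left[\frac{|x_I|^2}{\sigma^2} - d + 1 - \frac{W^2}{\sigma^2}\right] + O(\epsilon),$$

where $x_{I,i}$ denotes the $i$-th coordinate of $x_I$. Condition (ii) of Theorem 5 is thus satisfied with
$E = \frac{1}{d-1}\left[\frac{|x_I|^2}{\sigma^2} - d + 1 - \frac{W^2}{\sigma^2}\right]$. Condition (iii) of the theorem is trivial by (5); it follows that

(6)   $$d_{TV}(W, \sigma Z) \le \frac{1}{d-1} E\left|\frac{|x_I|^2}{\sigma^2} - d + 1 - \frac{W^2}{\sigma^2}\right| \le \frac{1}{d-1}\left[\frac{1}{n}\sum_{i=1}^n \Bigl|\sigma^{-2}|x_i|^2 - d\Bigr| + 2\right] \le \frac{A+2}{d-1}.$$

This is the annealed statement referred to at the beginning of the proof.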



We next use the concentration of measure on the sphere to show that, for a large measure of
$\theta \in S^{d-1}$, the random measure $\mu_n^\theta$ which puts mass $\frac{1}{n}$ at each of the $\langle\theta, x_i\rangle$ is close to the average
behavior. To do this, we make use of Lévy's Lemma (Lemma 7). Let $f : \mathbb{R} \to \mathbb{R}$ be such that
$\|f\|_{BL} := \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|} \le 1$. Consider the function $F$ defined on the sphere by

$$F(\theta) := \int f(x)\, d\mu_n^\theta(x) = \frac{1}{n}\sum_{i=1}^n f(\langle\theta, x_i\rangle).$$

In order to apply Lemma 7, it is necessary to determine the Lipschitz constant of $F$. Let
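$\theta, \theta' \in S^{d-1}$. Then, using $\|f\|_{BL} \le 1$ together with equation (4),

$$|F(\theta') - F(\theta)| = \left|\frac{1}{n}\sum_{i=1}^n \left[f(\langle\theta', x_i\rangle) - f(\langle\theta, x_i\rangle)\right]\right| \le \frac{1}{n}\sum_{i=1}^n \left|\langle\theta' - \theta, x_i\rangle\right| \le \left[\frac{1}{n}\sum_{i=1}^n \langle\theta' - \theta, x_i\rangle^2\right]^{1/2} \le |\theta' - \theta|\sqrt{B},$$

thus the Lipschitz constant of $F$ is bounded by $\sqrt{B}$. It follows from Lemma 7 that

$$P\left[|F(\theta) - M_F| > \epsilon\right] \le \sqrt{\frac{\pi}{2}}\, e^{-\frac{(d-1)\epsilon^2}{2B}},$$

where $M_F$ is the median of the function $F$.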
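Now, if $\theta$ is a random point of $S^{d-1}$, then

(7)   $$|EF(\theta) - M_F| \le E|F(\theta) - M_F| = \int_0^\infty P\left[|F(\theta) - M_F| > t\right] dt \le \int_0^\infty \sqrt{\frac{\pi}{2}}\, e^{-\frac{(d-1)t^2}{2B}}\, dt = \frac{\pi}{2}\sqrt{\frac{B}{d-1}};$$

thus if $\epsilon > 2\pi\sqrt{\frac{B}{d-1}}$, then $|EF(\theta) - M_F| \le \frac{\epsilon}{4}$, and we may use concentration about the median of $F$
to obtain concentration about the mean, with only a loss in constants.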
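Note that

$$EF(\theta) = E\int f\, d\mu_n^\theta = Ef(W)$$

for $W = \langle\theta, x_I\rangle$ as above, and so by the bound (6),

$$|EF(\theta) - Ef(\sigma Z)| \le \frac{A+2}{d-1}.$$

Putting these pieces together, if $\epsilon > \max\left\{\frac{2\pi\sqrt{B}}{\sqrt{d-1}}, \frac{2(A+2)}{d-1}\right\}$, then

$$P\left[\left|\int f\, d\mu_n^\theta - Ef(\sigma Z)\right| > \epsilon\right] \le P\Bigl[|F(\theta) - M_F| > \epsilon - |M_F - EF(\theta)| - |EF(\theta) - Ef(\sigma Z)|\Bigr]$$
$$\le P\left[|F(\theta) - M_F| > \frac{\epsilon}{4}\right] \le \sqrt{\frac{\pi}{2}}\, e^{-\frac{(d-1)\epsilon^2}{32B}}.$$

□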



Proof of Theorem 3. The first two steps of the proof of Theorem 3 were essentially done already in
the proof of Theorem 2. From that proof, we have that if $W = \langle\theta, x_I\rangle$ for $\theta$ distributed uniformly
on $S^{d-1}$ and $I$ independent of $\theta$ and uniformly distributed in $\{1, \ldots, n\}$, then

(8)   $$d_{TV}(W, \sigma Z) \le \frac{A+2}{d-1}$$

for $A$ as in equation (3). Furthermore, it follows from equation (7) in the proof of Theorem 2 that
for $F(\theta) := \int f\, d\mu_n^\theta$ and $\epsilon > \pi\sqrt{\frac{B}{d-1}}$,

(9)   $$P\left[|F(\theta) - EF(\theta)| > \epsilon\right] \le P\left[|F(\theta) - M_F| > \epsilon - |M_F - EF(\theta)|\right] \le P\left[|F(\theta) - M_F| > \frac{\epsilon}{2}\right] \le \sqrt{\frac{\pi}{2}}\, e^{-\frac{(d-1)\epsilon^2}{8B}}.$$

In this proof, this last statement is used together with a series of successive approximations of
arbitrary bounded Lipschitz functions as used by Guionnet and Zeitouni [2] to obtain a bound for
$P\left[d_{BL}(\mu_n^\theta, N(0, \sigma^2)) > \epsilon\right]$.

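By definition,

$$P\left[d_{BL}(\mu_n^\theta, E\mu_n^\theta) > \epsilon\right] = P\left[\sup_{\|f\|_{BL} \le 1} \left|\int f\, d\mu_n^\theta - E\int f\, d\mu_n^\theta\right| > \epsilon\right].$$

First consider the subclass $\mathcal{F}_{BL,K} = \{f : \|f\|_{BL} \le 1, \mathrm{supp}(f) \subseteq K\}$ for a compact set $K \subseteq \mathbb{R}$. Let
$\Delta = \frac{\epsilon}{4}$; for $f \in \mathcal{F}_{BL,K}$, define the approximation $f_\Delta$ as in Guionnet and Zeitouni [2] as follows. Let
$x_0 = \inf K$ and let

$$g(x) = \begin{cases} 0 & x \le 0; \\ x & 0 \le x \le \Delta; \\ \Delta & x \ge \Delta. \end{cases}$$

For $x \in K$, define $f_\Delta$ recursively by $f_\Delta(x_0) = 0$ and

$$f_\Delta(x) = \sum_{i=0}^{\lceil (x - x_0)/\Delta \rceil} \Bigl[2 \cdot \mathbb{1}\bigl[f(x_0 + (i+1)\Delta) > f_\Delta(x_0 + i\Delta)\bigr] - 1\Bigr]\, g(x - x_0 - i\Delta).$$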



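That is, the function $f_\Delta$ is just an approximation of $f$ by a function which is piecewise linear and
has slope 1 or $-1$ on each of the intervals $[x_0 + i\Delta, x_0 + (i+1)\Delta]$. Note that, because $\|f\|_{BL} \le 1$,
it follows that $\|f - f_\Delta\|_\infty \le \Delta$ and the number of distinct functions whose linear span is used to
approximate $f$ in this way is bounded by $\frac{|K|}{\Delta}$, where $|K|$ is the diameter of $K$. If $\{h_k\}_{k=1}^N$ denotes the
set of functions used in the approximation $f_\Delta$ and their coefficients, then for $\epsilon^2 > 8\pi|K|\sqrt{\frac{B}{d-1}}$,

$$P\left[\sup_{f \in \mathcal{F}_{BL,K}} \left|\int f\, d\mu_n^\theta - E\int f\, d\mu_n^\theta\right| > \epsilon\right] \le P\left[\sup_{f \in \mathcal{F}_{BL,K}} \left|\int f_\Delta\, d\mu_n^\theta - E\int f_\Delta\, d\mu_n^\theta\right| > \epsilon - 2\Delta\right]$$
$$\le P\left[\sum_{k=1}^N \left|\int h_k\, d\mu_n^\theta - E\int h_k\, d\mu_n^\theta\right| > \frac{\epsilon}{2}\right]$$
$$\le \sum_{k=1}^N P\left[\left|\int h_k\, d\mu_n^\theta - E\int h_k\, d\mu_n^\theta\right| > \frac{\epsilon}{2N}\right]$$
$$\le \sqrt{\frac{\pi}{2}}\, N e^{-\frac{(d-1)}{8B}\left(\frac{\epsilon}{2N}\right)^2}$$
$$\le \sqrt{\frac{\pi}{2}}\, \frac{4|K|}{\epsilon}\, e^{-\frac{(d-1)}{8B}\left(\frac{\epsilon^2}{8|K|}\right)^2}.$$

The second-last line follows from equation (9) above, and the last line from the bound $N \le \frac{|K|}{\Delta} = \frac{4|K|}{\epsilon}$.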
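The approximation $f_\Delta$ is easy to compute on the grid $x_0, x_0 + \Delta, x_0 + 2\Delta, \ldots$: start at 0 and move up by $\Delta$ when $f$ at the next grid point exceeds the current value, and down by $\Delta$ otherwise. The sketch below is an illustration only; the test function $f(x) = 0.4\sin^2 x$ on $K = [0, \pi]$ (extended by zero), the value of $\Delta$, and all function names are assumptions chosen for the example.

```python
import math

def piecewise_approximation(f, x0, x1, delta):
    """Grid values of f_Delta at x0, x0 + delta, ...: start at 0, then step
    up by delta if f at the next grid point exceeds the current value,
    and step down by delta otherwise."""
    steps = math.ceil((x1 - x0) / delta)
    values = [0.0]
    for i in range(steps):
        up = f(x0 + (i + 1) * delta) > values[-1]
        values.append(values[-1] + (delta if up else -delta))
    return values

def evaluate(values, x0, delta, x):
    """Linear interpolation between grid values (slope +1 or -1 per cell)."""
    i = int((x - x0) // delta)
    i = max(0, min(i, len(values) - 2))
    t = (x - x0 - i * delta) / delta
    return values[i] + t * (values[i + 1] - values[i])

def f(x):
    """Example with ||f||_BL = 0.8: sup norm 0.4, Lipschitz constant 0.4,
    supported on [0, pi]."""
    return 0.4 * math.sin(x) ** 2 if 0.0 <= x <= math.pi else 0.0

x0, x1, delta = 0.0, math.pi, 0.05
vals = piecewise_approximation(f, x0, x1, delta)
grid = [x0 + k * (x1 - x0) / 2000 for k in range(2001)]
err = max(abs(f(x) - evaluate(vals, x0, delta, x)) for x in grid)
print(f"sup-norm error on the test grid: {err:.4f} (Delta = {delta})")
```

The reason the error never exceeds $\Delta$ is that on each cell $f - f_\Delta$ is monotone (its slope lies in $[-2, 0]$ or $[0, 2]$), and at the grid points the rule keeps the error in $[-\Delta, \Delta]$.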

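To move to the full set $\mathcal{F}_{BL} := \{f : \|f\|_{BL} \le 1\}$, we make a truncation argument. Given $f \in \mathcal{F}_{BL}$
and $M > 0$, define $f_M$ by

$$f_M(x) = \begin{cases} 0 & x \le -M - |f(-M)|; \\ \mathrm{sgn}(f(-M))\left[x + M + |f(-M)|\right] & -M - |f(-M)| < x \le -M; \\ f(x) & -M < x \le M; \\ \mathrm{sgn}(f(M))\left[|f(M)| + M - x\right] & M < x \le M + |f(M)|; \\ 0 & x > M + |f(M)|; \end{cases}$$

that is, $f_M$ is equal to $f$ on $[-M, M]$ and drops off to zero linearly with slope 1 outside $[-M, M]$.
Then, since $f(x) = f_M(x)$ for $x \in [-M, M]$ and $|f(x) - f_M(x)| \le 1$ for $x \notin [-M, M]$,

$$\left|\int (f - f_M)\, d\mu_n^\theta\right| \le P\left[|\langle x_I, \theta\rangle| > M\right] \le \frac{1}{M^2} E\left[\langle x_I, \theta\rangle^2\right] \le \frac{B}{M^2},$$

where the probability and expectation are taken over $I$ for fixed $\theta$.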

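Choosing $M$ such that $\frac{B}{M^2} = \frac{\epsilon}{4}$, it follows that for $\epsilon^{5/2} > \frac{192\pi B}{\sqrt{d-1}}$,

$$P\left[\sup_{f \in \mathcal{F}_{BL}} \left|\int f\, d\mu_n^\theta - E\int f\, d\mu_n^\theta\right| > \epsilon\right] \le P\left[\sup_{f \in \mathcal{F}_{BL}} \left|\int f_M\, d\mu_n^\theta - E\int f_M\, d\mu_n^\theta\right| > \frac{\epsilon}{2}\right]$$
$$\le P\left[\sup_{g \in \mathcal{F}_{BL,[-M-1,M+1]}} \left|\int g\, d\mu_n^\theta - E\int g\, d\mu_n^\theta\right| > \frac{\epsilon}{2}\right]$$
$$\le \sqrt{\frac{\pi}{2}}\, \frac{16(M+1)}{\epsilon}\, e^{-\frac{(d-1)\epsilon^4}{2^{15}B(M+1)^2}}$$
$$\le \sqrt{\frac{\pi}{2}}\, \frac{48\sqrt{B}}{\epsilon^{3/2}}\, e^{-\frac{(d-1)\epsilon^5}{9 \cdot 2^{15}B^2}},$$

assuming that $B > \epsilon$ (so that $M = 2\sqrt{B/\epsilon} > 2$ and hence $M + 1 \le \frac{3M}{2}$).

Recall that $E\int f\, d\mu_n^\theta = Ef(W)$ for $W = \langle\theta, x_I\rangle$, and so by the bound (8),

$$\sup_{f \in \mathcal{F}_{BL}} \left|E\int f\, d\mu_n^\theta - Ef(\sigma Z)\right| \le \frac{A+2}{d-1}.$$

Thus for $\epsilon > \frac{2(A+2)}{d-1}$ this supremum is at most $\frac{\epsilon}{2}$, and applying the estimate above with $\frac{\epsilon}{2}$ in
place of $\epsilon$ gives

$$P\left[d_{BL}(\mu_n^\theta, N(0, \sigma^2)) > \epsilon\right] \le P\left[\sup_{f \in \mathcal{F}_{BL}} \left|\int f\, d\mu_n^\theta - E\int f\, d\mu_n^\theta\right| > \frac{\epsilon}{2}\right] \le \frac{c_1\sqrt{B}}{\epsilon^{3/2}} \exp\left[-\frac{c_2(d-1)\epsilon^5}{B^2}\right].$$

□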



Proof of Corollary 4. The proof is essentially trivial. Note that

$$P\left[T_\epsilon > m\right] \ge \left(1 - \frac{c_1\sqrt{B}}{\epsilon^{3/2}} \exp\left[-\frac{c_2(d-1)\epsilon^5}{B^2}\right]\right)^m$$

by independence of the $\theta_j$ and Theorem 3, since $T_\epsilon > m$ if and only if $d_{BL}(\mu_n^{\theta_j}, N(0, \sigma^2)) \le \epsilon$ for all
$1 \le j \le m$. This bound can be used in the identity $ET_\epsilon = \sum_{m=0}^\infty P[T_\epsilon > m]$ to obtain the bound in
the corollary.

Remark: One of the features of the proofs given above is that they can be generalized to the case
of $k$-dimensional projections of the $d$-dimensional data vectors $\{x_i\}$, with $k$ fixed or even growing
with $d$. The proof of the higher-dimensional analog of Theorem 2 goes through essentially the same
way. However, the analog of the proof of Theorem 3 from Theorem 2 is rather more involved in
the multivariate setting and will be the subject of a future paper.